Apple Silicon
Production-Grade Local LLM Inference on Apple Silicon: A Comparative Study of MLX, MLC-LLM, Ollama, llama.cpp, and PyTorch MPS
Rajesh, Varun, Jodhpurkar, Om, Anbuselvan, Pooja, Singh, Mantinder, Jallepali, Ashok, Godbole, Shantanu, Sharma, Pradeep Kumar, Shrivastava, Hritvik
We present a systematic, empirical evaluation of five local large language model (LLM) runtimes on Apple Silicon: MLX, MLC-LLM, llama.cpp, Ollama, and PyTorch MPS. Experiments were conducted on a Mac Studio equipped with an M2 Ultra processor and 192 GB of unified memory. Using the Qwen-2.5 model family across prompts ranging from a few hundred to 100,000 tokens, we measure time-to-first-token (TTFT), steady-state throughput, latency percentiles, long-context behavior (key-value and prompt caching), quantization support, streaming performance, batching and concurrency behavior, and deployment complexity. Under our settings, MLX achieves the highest sustained generation throughput, while MLC-LLM delivers consistently lower TTFT for moderate prompt sizes and offers stronger out-of-the-box inference features. llama.cpp is highly efficient for lightweight single-stream use, Ollama emphasizes developer ergonomics but lags in throughput and TTFT, and PyTorch MPS remains limited by memory constraints on large models and long contexts. All frameworks execute fully on-device with no telemetry, ensuring strong privacy guarantees. We release scripts, logs, and plots to reproduce all results. Our analysis clarifies the design trade-offs in Apple-centric LLM deployments and provides evidence-based recommendations for interactive and long-context processing. Although Apple Silicon inference frameworks still trail NVIDIA GPU-based systems such as vLLM in absolute performance, they are rapidly maturing into viable, production-grade solutions for private, on-device LLM inference.
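The TTFT/throughput distinction the abstract draws can be sketched as a tiny measurement harness. This is a minimal sketch, not the paper's actual methodology: the generator below is a stand-in for any runtime's streaming API (the timings and the `fake_stream` helper are illustrative assumptions).

```python
import time
from typing import Iterable, Tuple

def measure_stream(tokens: Iterable[str]) -> Tuple[float, float, int]:
    """Time-to-first-token (TTFT) and decode throughput (tok/s)
    for any streaming token iterator."""
    start = time.perf_counter()
    first = None
    count = 0
    for _ in tokens:
        if first is None:
            first = time.perf_counter()  # prefill ends here
        count += 1
    end = time.perf_counter()
    ttft = float("nan") if first is None else first - start
    # Throughput is measured over the decode phase only (after the
    # first token), mirroring the prefill/generation split above.
    if count > 1 and end > first:
        tps = (count - 1) / (end - first)
    else:
        tps = float("nan")
    return ttft, tps, count

# Stand-in generator; in practice this would wrap a runtime's
# streaming call (function names vary per framework and version).
def fake_stream(n=50, prefill=0.02, per_token=0.005):
    time.sleep(prefill)        # simulated prompt processing
    for i in range(n):
        time.sleep(per_token)  # simulated per-token decode
        yield f"tok{i}"

ttft, tps, n = measure_stream(fake_stream())
print(f"TTFT={ttft*1000:.1f} ms  throughput={tps:.1f} tok/s  ({n} tokens)")
```

The same `measure_stream` function can be pointed at each framework's token stream, which is what makes the TTFT and throughput numbers comparable across runtimes.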
Towards Building Private LLMs: Exploring Multi-Node Expert Parallelism on Apple Silicon for Mixture-of-Experts Large Language Model
Chen, Mu-Chi, Huang, Po-Hsuan, Ke, Xiangrui, Tu, Chia-Heng, Xue, Chun Jason, Hung, Shih-Hao
Large Language Models (LLMs) have revolutionized Artificial Intelligence (AI), with significant advances such as OpenAI's ChatGPT, Meta's Llama, and Databricks' DBRX. This paper addresses the cost and scalability challenges of constructing private LLM systems for personal or small-group services, as envisioned by Apple Intelligence. A Mac Studio cluster with Apple's M2 Ultra chips is established as a cost-efficient solution to host and accelerate the pretrained DBRX model, which uses the Mixture-of-Experts (MoE) architecture. Our performance analysis reveals that parallel execution of the model's experts across two to four machine nodes significantly reduces inference time. We find that the computation time for the experts is comparable to the communication time for exchanging their outputs, emphasizing the importance of network latency over bandwidth. We also observe significant management overhead due to the Apple software stack's memory management logic. Based on these findings, we develop optimization schemes to eliminate the memory management overhead. As a result, the Mac Studio cluster is 1.15 times more cost-efficient than a state-of-the-art AI supercomputer with NVIDIA H100 GPUs. In addition, we construct a performance model to estimate system performance under varying configurations; the model provides valuable insights for designing private LLM systems.
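The latency-versus-bandwidth point above follows from a standard alpha-beta communication model: per-token expert outputs are small, so the fixed network latency dominates the transfer time. A minimal sketch, with illustrative numbers that are assumptions rather than the paper's measurements:

```python
def exchange_time_us(payload_bytes: int, latency_us: float,
                     bandwidth_gbps: float) -> float:
    """Alpha-beta cost model for one expert-output exchange:
    total time = fixed network latency + payload / bandwidth."""
    transfer_us = payload_bytes * 8 / (bandwidth_gbps * 1e3)  # Gbit/s -> bit/us
    return latency_us + transfer_us

# Illustrative assumptions: one token's fp16 hidden-state vector of
# size 6144, 10 GbE bandwidth, 200 us round-trip network latency.
payload = 6144 * 2                      # bytes
total = exchange_time_us(payload, latency_us=200.0, bandwidth_gbps=10.0)
transfer = total - 200.0
print(f"transfer={transfer:.1f} us vs latency=200.0 us "
      f"({transfer / total:.0%} of the exchange)")
```

Under these assumed numbers the wire transfer is under 10 microseconds while latency contributes hundreds, which is why reducing round trips matters more than adding bandwidth.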
ConsumerBench: Benchmarking Generative AI Applications on End-User Devices
Gu, Yile, Kadekodi, Rohan, Nguyen, Hoang, Kamahori, Keisuke, Liu, Yiyu, Kasikci, Baris
The recent shift in Generative AI (GenAI) applications from cloud-only environments to end-user devices introduces new challenges in resource management, system efficiency, and user experience. This paper presents ConsumerBench, a comprehensive benchmarking framework designed to evaluate the system efficiency and response time of GenAI models running on end-user devices. Unlike existing benchmarks that assume exclusive model access on dedicated GPUs, ConsumerBench simulates realistic multi-application scenarios executing concurrently on constrained hardware. Furthermore, ConsumerBench supports customizable workflows that simulate complex tasks requiring coordination among multiple applications. ConsumerBench captures both application-level metrics, including latency and Service Level Objective (SLO) attainment, and system-level metrics like CPU/GPU utilization and memory bandwidth. Through extensive experiments, ConsumerBench reveals inefficiencies in resource sharing, unfair scheduling under greedy allocation, and performance pitfalls of static model server configurations. The paper also provides practical insights for model developers and system designers, highlighting the benefits of custom kernels tailored to consumer-grade GPU architectures and the value of implementing SLO-aware scheduling strategies.
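The application-level metrics ConsumerBench captures, latency percentiles and SLO attainment, reduce to simple arithmetic over per-request latency samples. A minimal sketch (the sample values and the 150 ms SLO are illustrative assumptions, not ConsumerBench's API):

```python
def slo_attainment(latencies_ms, slo_ms):
    """Fraction of requests meeting the latency SLO, plus nearest-rank
    p50/p95 percentiles of the sample."""
    s = sorted(latencies_ms)
    met = sum(1 for x in s if x <= slo_ms)

    def pct(p):
        # nearest-rank index into the sorted sample
        k = int(round(p / 100 * (len(s) - 1)))
        return s[min(max(k, 0), len(s) - 1)]

    return met / len(s), pct(50), pct(95)

# Illustrative latency sample (ms) for one app under contention:
lat = [80, 95, 110, 120, 300, 90, 85, 105, 98, 102]
rate, p50, p95 = slo_attainment(lat, slo_ms=150)
print(f"SLO attainment={rate:.0%}  p50={p50} ms  p95={p95} ms")
```

Note how one straggler (300 ms) barely moves the median but dominates p95, which is why SLO-aware scheduling targets tail latency rather than averages.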
Fine-tuning LLaMA 2 inference: a comparative study of language implementations for optimal efficiency
Hossain, Sazzad, Seyam, Touhidul Alam, Chowdhury, Avijit, Xamidov, Munis, Ghose, Rajib, Pathak, Abhijit
This paper conducts a comparative investigation to maximize the effectiveness of Llama2 inference, a critical task in machine learning and natural language processing (NLP). Various programming languages and frameworks, including TensorFlow, PyTorch, Python, Mojo, C++, and Java, are examined, assessing their speed, memory consumption, and ease of implementation through extensive testing and benchmarking. The advantages and disadvantages of each strategy are noted, with suggested optimization methods for parallel processing and hardware utilization. Additionally, the performance of the Mojo SDK, a novel framework designed for LLM inference on Apple Silicon, is investigated, comparing it against established implementations in C, C++, Rust, Zig, Go, and Julia. Through comprehensive benchmarking on an Apple M1 Max, Mojo SDK's competitive performance and its advantages in ease of use and Python compatibility are demonstrated, suggesting it is a compelling alternative for LLM inference on Apple Silicon. Implications for the future of LLM deployment on resource-limited hardware and potential avenues for further research are discussed.
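Comparing implementations across languages fairly hinges on a consistent timing harness: warm up first (JIT compilation and allocation skew early runs), then report a robust statistic such as the median. A minimal sketch of such a harness; `dummy_generate` is a hypothetical stand-in for any one implementation's forward pass:

```python
import time
import statistics

def bench_tokens_per_sec(fn, tokens_per_call, warmup=2, iters=5):
    """Median tokens/s for a generation callable. Warmup runs are
    discarded so one-time costs (JIT, allocation) don't skew results."""
    for _ in range(warmup):
        fn()
    rates = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        rates.append(tokens_per_call / (time.perf_counter() - t0))
    return statistics.median(rates)

# Hypothetical stand-in for one implementation; a real harness would
# invoke each runtime (C, Mojo, Rust, Zig, ...) through the same wrapper.
def dummy_generate(n_tokens=1000):
    acc = 0
    for i in range(n_tokens):
        acc += i * i  # placeholder for per-token compute
    return acc

rate = bench_tokens_per_sec(dummy_generate, tokens_per_call=1000)
print(f"median throughput: {rate:,.0f} tok/s")
```

Wrapping every implementation behind the same callable interface is what makes the cross-language tokens/s numbers directly comparable.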
The Mac turns 40: How Apple Silicon cured its midlife crisis
The Mac, formerly the more austere Macintosh, turns 40 today, putting Apple's longest-running product squarely in middle age. But like someone who sees the back half of their life approaching and gets in marathon-runner shape, the Mac is in the strongest place it's been for decades. From a revenue perspective, Mac sales declined precipitously in 2023, but that came on the heels of four years of growth that was likely the product of pent-up demand for an improved Mac lineup. In 2020, Apple finally started delivering on that, thanks in large part to Apple Silicon arriving in the Mac, ushering in the era we're in now. While the Mac was on shaky ground prior to Apple Silicon, it would now be pretty silly to suggest the Mac won't make it to its 50th birthday.
Get 'ducking' excited: Apple is finally addressing this annoying autocorrect issue
Apple users who are tired of that "ducking" autocorrect issue can now rejoice! The tech company announced Monday at this year's Worldwide Developers Conference that iOS 17 will ensure that autocorrected words are temporarily underlined so users know what has been changed and can quickly change the word back to what they originally meant to type. "Autocorrect is powered by on-device machine learning and over the years, we've continued to advance these models," said Craig Federighi, the company's software chief. "The keyboard now leverages a transformer language model, which is state of the art for word prediction, making autocorrect more accurate than ever." The autocorrect feature has been the subject of tweets, memes and other social media posts for years, often annoying already irritated people trying to drop a popular expletive by changing the word to "ducking."
Stable Diffusion with Core ML on Apple Silicon - Apple Machine Learning Research
Today, we are excited to release optimizations to Core ML for Stable Diffusion in macOS 13.1 and iOS 16.2, along with code to get started with deploying to Apple Silicon devices. Since its public debut in August 2022, Stable Diffusion has been adopted by a vibrant community of artists, developers and hobbyists alike, enabling the creation of unprecedented visual content with as little as a text prompt. In response, the community has built an expansive ecosystem of extensions and tools around this core technology in a matter of weeks. There are already methods that personalize Stable Diffusion, extend it to languages other than English, and more, thanks to open-source projects like Hugging Face diffusers. Beyond image generation from text prompts, developers are also discovering other creative uses for Stable Diffusion, such as image editing, in-painting, out-painting, super-resolution, style transfer and even color palette generation.
Apple's chips are on the table
Apple's transition to its own processors is nearly complete. The company's recent spring event saw the debut of the Mac Studio and its M1 Ultra processor -- its most powerful piece of silicon yet. But it also revealed what the future of Apple's computers could look like. For the first time, all of Apple's chips are on the table. The first crucial takeaway is that Apple is now a force to be reckoned with when it comes to chips (if it wasn't already).
Apple Macs with custom Arm-based silicon chips unveiled today
Apple is expected to unveil the first Mac computers powered by its own custom Arm-based processor at its 'One More Thing' event tonight. The event will be livestreamed from the Apple headquarters in Cupertino, California from 18:00 GMT (13:00 ET) on Tuesday, November 10. This marks the first time in the Mac's 36-year history that the line will be powered by an Apple-designed processor, which is said to offer better performance, higher bandwidth and consume less power than the Intel-based machines currently in use. Apple is expected to start shipping the first Arm Macs before the end of the year, with all of its devices adopting the new system within two years. Apple has officially announced its upcoming November 10 event that is set to reveal the tech giant's first Arm-based Macs.